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] Abstract. Commonly used methods to analyze incomplete longitudi- 

^ ' nal clinical trial data include complete case analysis (CC) and last 

observation carried forward (LOCF). However, such methods rest on 
strong assumptions, including missing completely at random (MCAR) 
for CC and unchanging profile after dropout for LOCF. Such assump- 
tions are too strong to generally hold. Over the last decades, a number 
of full longitudinal data analysis methods have become available, such 
as the linear mixed model for Gaussian outcomes, that are valid un- 
der the much weaker missing at random (MAR) assumption. Such a 
J2 ' method is useful, even if the scientific question is in terms of a sin- 

gle time point, for example, the last planned measurement occasion, 
and it is generally consistent with the intention-to-treat principle. The 
validity of such a method rests on the use of maximum likelihood, un- 
, der which the missing data mechanism is ignorable as soon as it is 

MAR. In this paper, we will focus on non-Gaussian outcomes, such as 
^j- ■ binary, categorical or count data. This setting is less straightforward 

I since there is no unambiguous counterpart to the linear mixed model. 

' We first provide an overview of the various modeling frameworks for 

. non-Gaussian longitudinal data, and subsequently focus on generalized 

linear mixed-effects models, on the one hand, of which the parameters 
can be estimated using full likelihood, and on generalized estimating 

a equations, on the other hand, which is a nonlikelihood method and 
hence requires a modification to be valid under MAR. We briefly com- 
. ment on the position of models that assume missingness not at random 

^ I and argue they are most useful to perform sensitivity analysis. Our 

\^ • developments are underscored using data from two studies. While the 
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case studies feature binary outcomes, the methodology applies equally 
well to other discrete-data settings, hence the qualifier "discrete" in the 
title. 

Key words and phrases: Complete case analysis, ignorability, gener- 
alized estimating equations, generalized linear mixed models, last ob- 
servation carried forward, missing at random, missing completely at 
random, missing not at random, sensitivity analysis. 
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1. INTRODUCTION 

Data from longitudinal studies, in general, and 
from clinical trials, in particular, are prone to incom- 
pleteness. Dropout is a special case of incomplete- 
ness. Since incompleteness usually occurs for reasons 
outside the control of the investigators and may be 
related to the outcome measurement of interest, it is 
generally necessary to address the process that gov- 
erns incompleteness. Only in special but important 
cases is it possible to ignore the missingness process. 

When referring to the missing- value, or nonre- 
sponse, process, we will use the terminology of Little 
and Rubin (2002, Chapter 6). A nonresponse process 
is said to be missing completely at random (MCAR) 
if the missingness is independent of both unobserved 
and observed data, and said to be missing at ran- 
dom (MAR) if, conditional on the observed data, 
the missingness is independent of the unobserved 
measurements. A process that is neither MCAR nor 
MAR is termed nonrandom (MNAR). In the con- 
text of likelihood inference, and when the parame- 
ters that describe the measurement process are func- 
tionally independent of the parameters that describe 
the missingness process, MCAR and MAR are ig- 
norable, while a nonrandom process is nonignorable. 

Early work regarding missingness focused on the 
consequences of the induced lack of balance of devia- 
tions from the study design (Afifi and Elashoff, 1966; 
Hartley and Hocking, 1971). Later, algorithmic de- 
velopments took place, such as the expectation-maxi- 
mization algorithm (EM; Dempster, Laird and Ru- 
bin, 1977) and multiple imputation (Rubin, 1987). 
These advances have brought likelihood-based ig- 
norable analysis within reach for a large class of 
designs and models. However, they usually require 
extra programming in addition to available standard 
statistical software. 
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In the meantime, however, clinical trial practice 
has put a strong emphasis on such methods as com- 
plete case analysis (CC), which restricts the anal- 
ysis to those subjects for which all information has 
been measured according to protocol, and last obser- 
vation carried forward (LOCF), for which the last 
observed measurement is substituted for values at 
later points in time that are not observed, or other 
simple forms of imputation. Claimed advantages in- 
clude computational simplicity, no need for a full 
longitudinal model (e.g., when the scientific ques- 
tion is in terms of the last planned measurement oc- 
casion only) and, for LOCF, compatibility with the 
intention-to-treat (ITT) principle. Within the Gaus- 
sian setting, Molenberghs et al. (2004) have argued 
that this focus is understandable, but, given current 
computational resources, unfortunate. They suggest 
the use of a likelihood-based ignorable analysis, for 
example, based on the linear mixed-effects model. 
Such a method requires MAR rather than the much 
stronger assumptions that underlie CC and LOCF, 
and uses all data, obviating the need for both delet- 
ing and filling in data, and is thus consistent with 
the intention-to-treat principle. Nevertheless, care 
has to be taken when subjects are discontinued for 
reasons of noncompliance, since then the modes of 
analysis indicated here would assume that treatment 
is unchanged after dropout. This implies the need 
for sensitivity analysis and an important discussion 
of this point has been given by Fitzmaurice (2003). 
Molenberghs et al. (2004) also show that the incom- 
plete sequences contribute to estimands of interest, 
even early dropouts when scientific interest is in the 
last planned measurement only. Finally, they show 
that such an analysis is possible, without the need 
for any additional data manipulation, using, for ex- 
ample, the SAS procedure MIXED or the SPlus or 
R function Ime. Of course, a longitudinal model has 
to be specified for the entire vector of responses. In 
a clinical trial setting, with relatively short and bal- 
anced response sequences, full multivariate models, 
encompassing full treatment-by- group interactions, 
perhaps corrected for baseline covariates, and an un- 
structured variance-covariance matrix, are usually 
within reach. A model of this type is relatively mild 
in the restrictions made. 

The non-Gaussian setting is different in the sense 
that there is no generally accepted counterpart to 
the linear mixed-effects model. We therefore first 
sketch a general taxonomy for longitudinal models in 
this context, including marginal, random-effects (or 
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subject-specific) and conditional models. We then 
argue that marginal and random-effects models both 
have their merits in the analysis of longitudinal clin- 
ical trial data and we focus on two important repre- 
sentatives, that is, the generalized estimating equa- 
tions (GEE) approach within the marginal family 
and the generalized linear mixed-effects model 
(GLMM) within the random-effects family. We high- 
light important similarities and differences between 
these model families. While GLMM parameters can 
be fitted using maximum likelihood, the same is not 
true for the GEE method, which is of a frequen- 
tist nature. Therefore, Robins, Rotnitzky and Zhao 
(1995) have devised so-called weighted generalized 
estimating equations (WGEE), which are valid un- 
der MAR but require the specification of a dropout 
model in terms of observed outcomes and / or covari- 
ates, in view of specifying the weights. Thus, we be- 
lieve that, generally, methods such as complete case 
analysis or LOCF ought to be abandoned in favor 
of the likelihood-based and weighted GEE models 
discussed here. 

By definition, MNAR missingness cannot be fully 
ruled out based on the observed data. Nevertheless, 
ignorable analyses may provide reasonably stable re- 
sults, even when the assumption of MAR is violated, 
in the sense that such analyses constrain the behav- 
ior of the unseen data to be similar to that of the 
observed data. A discussion of this phenomenon in 
the survey context has been given in Rubin, Stern 
and Vehovar (1995). These authors first argue that, 
in well-conducted experiments (some surveys and 
many confirmatory clinical trials), the assumption 
of MAR is often to be regarded as a realistic one. 
Second, and very important for confirmatory trials, 
an MAR analysis can be specified a priori without 
additional work relative to a situation with complete 
data. Third, while MNAR models are more general 
and explicitly incorporate the dropout mechanism, 
the inferences they produce are typically highly de- 
pendent on the untestable and often implicit built-in 
assumptions regarding the distribution of the unob- 
served measurements given the observed ones. The 
quality of the fit to the observed data need not re- 
flect at all the appropriateness of the implied struc- 
ture that governs the unobserved data. Based on 
these considerations, we recommend, for primary 
analysis purposes, the use of ignorable likelihood- 
based methods or appropriately modified frequentist 
methods. To explore the impact of deviations from 
the MAR assumption on the conclusions, one should 



ideally conduct a sensitivity analysis (Verb eke and 
Molenberghs, 2000, Chapters 18-20). 

Two case studies motivate our work. The first 
one arises from a randomized, double-blind psychi- 
atric clinical trial conducted in the United States. 
The primary objective of this trial was to compare 
the efficacy of an experimental anti-depressant with 
placebo in order to support a new drug application. 
The study enrolled 167 patients. The Hamilton de- 
pression rating scale (HAMD17) is used to measure 
the depression status of the patients. The binary in- 
dicator of interest is 1 if the H AMD 17 score is greater 
than 7, and otherwise. For each patient, a baseline 
assessment is available, as well as eight post-baseline 
visits going from visit 4 to visit 11. 

The second case study arises from a randomized 
multicentric clinical trial that compared an experi- 
mental treatment (interferon-o) to a corresponding 
placebo in the treatment of patients with age-related 
macular degeneration. Interest focuses on the com- 
parison between placebo and the highest dose (6 mil- 
lion units daily) of interferon-a (Z), but the full 
results of this trial have been reported elsewhere 
(Pharmacological Therapy for Macular Degenera- 
tion Study Group, 1997). Patients with macular de- 
generation progressively lose vision. In the trial, the 
patients' visual acuity was assessed at different time 
points (4 weeks, 12 weeks, 24 weeks and 52 weeks) 
through their ability to read lines of letters on stan- 
dardized vision charts. These charts display lines of 
five letters of decreasing size, which the patient must 
read from top (largest letters) to bottom (smallest 
letters). Each line with at least four letters correctly 
read is called one line of vision. The patient's visual 
acuity is the total number of letters correctly read. 
The primary endpoint of the trial was the loss of 
at least three lines of vision at 1 year, compared to 
their baseline performance (i.e., a binary endpoint). 
The total number of longitudinal profiles is 240, but 
only for 188 of these have the four follow-up mea- 
surements been made. An overview is given in Ta- 
ble 1. Thus, 78.33% of the profiles are complete, 
while 18.33% exhibit monotone missingness. Out of 
the latter group, 2.5% or six subjects have no follow- 
up measurements. The remaining 3.33%, represent- 
ing eight subjects, have intermittent missing values. 
Although the group of dropouts is of considerable 
magnitude, the ones with intermittent missingness 
is much smaller. Nevertheless, it is cautious to in- 
clude all data in the analyses. Data on this second 
case study are available on the author's website. 
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Table 1 

Age related macular degeneration trial: Overview of 
misaingness patterns and the frequencies with which they 
occur (O indicates observed and M indicates missing) 



Measurement occasion (weeks) 



4 


12 


24 


52 


Number 


% 








Completers 









O 


O 


O 


188 


78.33 








Dropouts 












O 


M 


24 


10.00 








M 


M 


8 


3.33 





M 


M 


M 


6 


2.50 


M 


M 


M 


M 


6 


2.50 






Nonmonotone missingness 










M 


O 


4 


1.67 





M 


M 





1 


0.42 


M 











2 


0.83 


M 





M 


M 


1 


0.42 



The general data setting is introduced in Section 2, 
as well as a formal framework for incomplete longi- 
tudinal data, together with a discussion of the prob- 
lems associated with simple methods. Section 3 fo- 
cuses on two important families of models for dis- 
crete repeated measures. The first case study is an- 
alyzed in Section 4, while the second one is the sub- 
ject of Section 5. The paper ends with a discussion 
in Section 6. 

2. DATA SETTING AND MODELING 
FRAMEWORK 

Assume that for subject i = 1, . . . , a sequence 
of responses Yij is designed to be measured at oc- 
casions j = 1, . . . , n. The outcomes are grouped into 
a vector Yj = (1^1, . . . , li,„,)'. In addition, for each 
occasion j, define Rij as being equal to 1 if Yij is 
observed and otherwise. The missing data indica- 
tors Rij are grouped into a vector Rj, which is of 
the same length as Yj . Define now a dropout indica- 
tor Di for the occasion at which dropout occurs and 
make the convention that Di = n + 1 for a complete 
sequence. Further, split the vector Yj into observed 
(Y°) and missing (Y,-") components, respectively. 

Modeling usually is initiated by considering the 
full data density f{yi,di\9,ip), where the parame- 
ter vectors 6 and i/? describe the measurement and 
missingness processes, respectively. Covariates are 
assumed to be measured, but for notational simplic- 
ity are suppressed from notation. 



Most strategies used to analyze such data are, 
implicitly or explicitly, based on the following two 
choices. 

Model for measurements. A choice has to be made 
regarding the modeling approach to the measure- 
ments. There are three common views. In the first 
view, one opts to analyze the entire longitudinal pro- 
file, irrespective of whether interest focuses on the 
entire profile, on the one hand (e.g., difference in 
slope between groups), or whether a specific time is 
of interest, on the other hand (e.g., the last planned 
occasion). In the latter case, the motivation to model 
the entire profile is because, for example, earlier re- 
sponses do provide statistical information on later 
ones. This is especially true when dropout is present. 
In the second view, one defines the scientific ques- 
tion and restricts the corresponding analysis to the 
last planned occasion. Of course, as soon as dropout 
occurs, such a measurement may not be available, 
whence often last observation carried forward is used. 
Such an analysis is based on strong assumptions 
that often do not hold. This point has been made 
extensively in Molenberghs et al. (2004). These au- 
thors advocate the use of proper, likelihood-based, 
longitudinal methods, which are generally valid un- 
der MAR. In the third view, one chooses to define 
the question and the corresponding analysis in terms 
of the last observed measurement. While sometimes 
used as an alternative motivation for so-called last 
observation carried forward analyses (Siddiqui and 
Ali, 1998; Mallinckrodt, Clark, Carroll and Molenberghs, 
2003a; Mallinckrodt et al., 2003b), a common crit- 
icism is that the last observed measurement amal- 
gamates measurements at real stopping times (for 
dropouts) and at a purely design-based time (for 
completers). Thus, we hope to show that the first 
view is in many situations the most sensible route 
of analysis. 

Method for handling missingness. A choice has to 
be made regarding the modeling approach to the 
missingness model. Luckily, under certain assump- 
tions this process can be ignored (likelihood-based 
or Bayesian ignorable analysis, for which MAR is a 
sufficient condition). Some simple methods such as 
CC analysis and LOCF do not explicitly address the 
missingness mechanism either, but are nevertheless 
not ignorable. We will return to this issue in the 
next section. 

Let us first describe the measurement and miss- 
ingness models in turn, and then introduce and com- 
ment on ignor ability. The measurement model will 
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depend on whether or not a fuh longitudinal analysis 
is done. In case the second or third view is adopted, 
one typically opts for classical two-group or multi- 
group comparisons (t test, Wilcoxon, etc.). In case a 
longitudinal analysis is deemed necessary, the choice 
made depends on the nature of the outcome. 

For continuous outcomes, a common choice is the 
general linear mixed-effects model or a special case 
of it, such as a (structured) multivariate normal 
model (Verbeke and Molenberghs, 2000). However, 
for categorical (nominal, ordinal and binary) and 
discrete outcomes (counts), as in our case study, 
the modeling choices are less straightforward. Ex- 
tensions of the generalized linear models to the lon- 
gitudinal case were discussed by Diggle, Heagerty, 
Liang and Zeger (2002), where a lot of emphasis is on 
generalized estimating equations (Liang and Zeger, 
1986). Generalized linear mixed models have been 
proposed and/or studied by, for example, Stiratelli, 
Laird and Ware (1984), Wolfinger and O'Connell 
(1993) and Breslow and Clayton (1993). Fahrmeir 
and Tutz (2001) devoted an entire book to general- 
ized linear models for multivariate settings. We re- 
turn to modeling non-Gaussian repeated measures 
in Section 3. It is important to note that, since quite 
distinct modeling families are in use, the researcher 
ought to be guided by the main scientific question at 
hand when choosing between the modeling families. 

Assume that incompleteness is due to dropout 
only and that the first measurement Yn is obtained 
for everyone. The model for the dropout process is 
based on, for example, a logistic regression for the 
probability of dropout at occasion j, given the sub- 
ject is still in the study. We denote this probability 
by g{hij,yij), in which hij is a vector that contains 
all responses observed up to but not including occa- 
sion j, as well as relevant covariates. We then assume 
that g{hij,yij) satisfies 

logit[g (hij,yij)] 

(1) =logit[pr(A=j|A>J,yi)] 

= hij'ijj + ujyij, i=l,...,N. 

When uj equals zero, and assuming the posited model 
is correct, the dropout model is MAR. If uj ^0, the 
posited dropout process is MNAR. Model (1) pro- 
vides the building blocks for the dropout process 
f{di\yi, ijj). 

Rubin (1976) and Little and Rubin (2002) have 
shown that, under MAR and mild regularity condi- 
tions (parameters 6 and i/' are functionally indepen- 
dent), likelihood-based and Bayesian inference are 



valid when the missing data mechanism is ignored 
(see also Verbeke and Molenberghs, 2000). Prac- 
tically speaking, the likelihood of interest is then 
based on the factor /(y°|0). This is called ignorahil- 
ity. A model of the form (1), of course with u = 0, 
may but does not have to be considered in such a 
case. The practical implication is that a software 
module with likelihood estimation facilities and with 
the ability to deal with incompletely observed sub- 
jects manipulates the correct likelihood, providing 
valid parameter estimates, standard errors if based 
on the observed information matrix and likelihood 
ratio values (Kenward and Molenberghs, 1998). Ex- 
amples of such software tools include the MIXED, 
NLMIXED and GENMOD procedures in SAS. 

A few cautionary remarks are in order. First, when 
at least part of the scientific interest is directed to- 
ward the nonresponse process, obviously both pro- 
cesses need to be considered. Still, under MAR, both 
processes can be modeled and parameters can be es- 
timated separately. Second, it may be hard to fully 
rule out the operation of an MNAR mechanism. 
Third, one is now restricted to the first view on mod- 
eling the outcomes, that is, a full longitudinal anal- 
ysis is necessary, even when interest is restricted to, 
for example, a comparison between the two treat- 
ment groups at the last occasion. In the latter case, 
the fitted model can be used as the basis for in- 
ference at the last occasion. A common criticism, 
especially in a regulated controlled clinical trial set- 
ting, is that a model needs to be considered. How- 
ever, it should be noted that in many clinical trial 
settings the repeated measures are balanced in the 
sense that a common (and often limited) set of mea- 
surement times is considered for all subjects, allow- 
ing the a priori (protocol) specification of a sat- 
urated model (e.g., full group- by-time interaction 
model for the means and unstructured variance- 
covariance matrix). 

Such an ignorable linear mixed model specifica- 
tion is termed mixed-effects model repeated mea- 
sures (MMRM) by Malhnckrodt, Clark and David 
(2001a, b). Thus, MMRM is a particular form of a 
linear mixed model, relevant for acute phase con- 
firmatory clinical trials, fitting within the ignorable 
likelihood paradigm. It has to be noted that this ap- 
proach, for the special case where no dropout occurs, 
is fully equivalent to a one-way multivariate analysis 
of variance model for the repeated outcomes, with 
a class variable treatment effect. This observation 
provides a strong basis for such an approach, which 
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is a very promising alternative for the simple ad hoc 
methods such as complete-case analysis or LOCF. 
While the above reasoning is tied to the continuous- 
outcome setting, similar modeling strategies exist 
for the non-Gaussian case, as discussed in Section 3. 

These arguments, supplemented with the avail- 
ability of software tools within which such multi- 
variate models can be fitted to incomplete data, cast 
doubts regarding the usefulness of such simple meth- 
ods as CC and LOCF. This issue has been discussed 
in detail, in the context of Gaussian outcomes, by 
Molenberghs et al. (2004). Apart from biases as soon 
as the missing data mechanism is not MCAR, CC 
can suffer from severe efficiency losses. Especially 
since tools have become available to include incom- 
plete sequences along with complete ones into the 
analysis, one should do everything possible to avoid 
wasting patient data. 

Last observation carried forward, as other impu- 
tation strategies (Dempster and Rubin, 1983; Little 
and Rubin, 2002), can lead to artificially infiated 
precision. Furthermore as Molenberghs et al. (2004) 
have shown, the method can produce severely bi- 
ased treatment comparisons and, perhaps contrary 
to some common belief, such biases can be conserva- 
tive but also liberal. The method rests on the strong 
assumption that a patient's outcome profile remains 
fiat, at the level of the last observed measurement, 
throughout the remainder of follow-up. 

3. DISCRETE REPEATED MEASURES 

Whereas the linear mixed model and its special 
cases is seen as a unifying parametric framework for 
Gaussian repeated measures (Verbeke and 
Molenberghs, 2000), there are many more options 
available in the non-Gaussian setting. In a marginal 
model, marginal distributions are used to describe 
the outcome vector Y, given a set X of predic- 
tor variables. The correlation among the compo- 
nents of Y can then be captured either by adopting 
a fully parametric approach or by means of work- 
ing assumptions, such as in the semiparametric ap- 
proach of Liang and Zeger (1986). Alternatively, in 
a random- effects model, the predictor variables X 
are supplemented with a vector 6 of random effects, 
conditional upon which the components of Y are 
usually assumed to be independent. This does not 
preclude that more elaborate models are possible if 
residual dependence is detected (Longford, 1993). 
Finally, a conditional model describes the distribu- 
tion of the components of Y, conditional on X but 



also conditional on (a subset of) the other compo- 
nents of Y. Well-known members of this class of 
models are log-linear models (Gilula and Haberman, 
1994). 

Let us give a simple example of each for the case 
of Gaussian outcomes. A marginal model starts by 
specifying 

(2) E(y,,|x,,)=x^/3, 

whereas in a random-effects model we focus on the 
expectation, conditional upon the random-effects vec- 
tor, 

(3) i?(yi,-|b„x,,-)=x^/3 + z^b,. 

The conditional model uses expectations of the form 

(4) E{Yij\Yi^j_i, . . .,Yii,^ij) = ^[jf3 + aVij-i. 

In the linear mixed model case, random-effects mod- 
els imply a simple marginal model. This is due to the 
elegant properties of the multivariate normal dis- 
tribution. In particular, the expectation (2) follows 
from (3) either by (a) marginalizing over the random 
effects or by (b) conditioning on the random-effects 
vector bj = 0. Hence, the fixed-effects parameters /3 
have both a marginal as well as a hierarchical model 
interpretation. 

Since marginal and random-effects models are the 
most useful ones in our context and given this con- 
nection between them, it is clear why the linear 
mixed model provides a unified framework in the 
Gaussian setting. Such a close connection between 
the model families does not exist when outcomes 
are of a nonnormal type, such as binary, categor- 
ical or discrete. We will consider the marginal and 
random-effects model families in turn and then point 
to some particular issues that arise within them or 
when comparisons are made between them. The con- 
ditional models are less useful in the context of longi- 
tudinal data and will not be discussed here (Molen- 
berghs and Verbeke, 2005). 

3.1 Marginal Models 

Thorough discussions on marginal modeling can 
be found in Diggle, Heagerty, Liang and Zeger (2002) 
and Fahrmeir and Tutz (2001). The specific context 
of clustered binary data has received treatment in 
Aerts, Geys, Molenberghs and Ryan (2002). Apart 
from full likelihood approaches, nonlikelihood ap- 
proaches such as generalized estimating equations 
(Liang and Zeger, 1986) or pseudolikelihood (le Cessie 
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and van Houwelingen, 1994; Geys, Molenberghs and 
Lipsitz, 1998) have been considered. 

Bahadur (1961) proposed a marginal model, ac- 
counting for the association via marginal correla- 
tions. Ekholm (1991) proposed a so-called success 
probabilities approach. Bowman and George (1995) 
proposed a model for the particular case of exchange- 
able binary data. Ashford and Sowden (1970) con- 
sidered the multivariate probit model for repeated 
ordinal data, thereby extending univariate probit re- 
gression. Molenberghs and Lesaffre (1994) and Lang 
and Agresti (1994) proposed models that parame- 
terize the association in terms of marginal odds ra- 
tios. Dale (1986) defined the bivariate global odds 
ratio model, based on a bivariate Plackett distri- 
bution (Plackett, 1965). Molenberghs and Lesaffre 
(1994, 1999), Lang and Agresti (1994) and Glonek 
and McCullagh (1995) extended this model to mul- 
tivariate ordinal outcomes. They generalized the bi- 
variate Plackett distribution in order to establish the 
multivariate cell probabilities. 

While full likelihood methods are appealing be- 
cause of their flexible ignorability properties (Sec- 
tion 2), their use for non-Gaussian outcomes can 
be problematic due to prohibitive computational re- 
quirements. Therefore, GEE is a viable alternative 
within this family. Since GEE is motivated by fre- 
quentist considerations, the missing data mechanism 
needs to be MCAR for it to be ignorable. This moti- 
vates the proposal of so-called weighted generalized 
estimating equations. We will discuss these in turn. 

3.1.1 Generalized estimating equations. General- 
ized estimating equations, useful to circumvent the 
computational complexity of full likelihood, can be 
considered whenever interest is restricted to the mean 
parameters (treatment difference, time evolutions, 
effect of baseline covariates, etc.). It is rooted in 
the quasi-likelihood ideas expressed by McCullagh 
and Nelder (1989). Modeling is restricted to the cor- 
rect specification of the marginal mean function, to- 
gether with so-called working assumptions about the 
correlation structure of the vector of repeated mea- 
sures. 

Let us now introduce the classical form of GEE. 
Note that the score equations to be solved when 
computing maximum likelihood estimates under a 
marginal normal model ~ N(Xif3, Vi) are given 
by 

N 

(5) J2^liAy'aAyY\y, - X,P) = 0, 

i=l 



in which the marginal covariance matrix Vi has been 

1/2 1/2 

decomposed in the form A- CiA- , where Ai is 
the matrix with the marginal variances on the main 
diagonal and zeros elsewhere, and Ci is equal to the 
marginal correlation matrix. Switching to the non- 
Gaussian case, the score equations become 

(6) s{p) = Y: ^{Ay'c.AyY'iy^ - ^^^) = 0, 

which are less linear than (5) due to the presence of a 
link function (e.g., the logit link for binary data) and 
the mean-variance relationship. Typically the cor- 
relation matrix Ci contains a vector o; of unknown 
parameters that is replaced for practical purposes 
by a consistent estimate. 

Assuming that the marginal mean /^j has been 
correctly specified as /i(/^j) = Xif3, it can be shown 
that, under mild regularity conditions, the estimator 
f3 obtained by solving (6) is asymptotically normally 
distributed with mean (3 and with covariance matrix 

(7) Io^hIo\ 
where 

In practice, Var(yj) in (7) is replaced by (y,/ — /zj • 
(yi — Mi)') which is unbiased on the sole condition 
of correct mean specification. One also needs es- 
timates of the nuisance parameters o;. Liang and 
Zeger (1986) proposed moment-based estimates for 
the working correlation. To this end, define devia- 
tions 

eij — I 

Some of the more popular choices for the work- 
ing correlations are independence [Coicic{Yij,Yik) = 
0,j / k], exchangeability [Corr {Yij,Yik) = a,j j^k], 
AR(1) [CoTT{Yij,Yij+t) = a\t = 0,l,...,ni-j] and 
unstructured [Corr {Yij,Yik) = ajk,j / A;]. Typically, 
moment-based estimation methods are used to esti- 
mate these parameters as part of an integrated it- 
erative estimation procedure. An over dispersion pa- 
rameter could be included as well, but we have sup- 
pressed it for ease of exposition. The standard it- 
erative procedure to fit GEE, based on Liang and 
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Zeger (1986), is then as follows: (1) compute initial 
estimates for (3 using a univariate generalized linear 
model (i.e., assuming independence); (2) compute 
the quantities bj needed in the estimating equa- 
tion; (3) compute Pearson residuals Cjj; (4) compute 
estimates for a; (5) compute Ci{oL)] (6) compute 



V,{(3,cx) = A]'"^ {(3)Ci{cx)A\' \I3); (7) update the es 
timate for j3: 



1/2, 



it) 



■ N 

E 

.1=1 



d(3 ^ d(3 



Steps 2-7 are iterated until convergence. To illus- 
trate step 4, consider compound symmetry, in which 
case the correlation is estimated by 



1 



N 



a 



-T- 



1 



\ niijii 



1 \ E ^ij^ik- 



3.1.2 Weighted generalized estimating equations. 
As Liang and Zeger (1986) pointed out, GEE-based 
inferences are valid only under MCAR, due to the 
fact that they are based on frequentist considera- 
tions. An important exception mentioned by these 
authors is the situation where the working corre- 
lation structure (discussed in the previous section) 
happens to be correct, since then the estimates and 
model-based standard errors are valid under the 
weaker MAR. This is because then the estimating 
equations can be interpreted as likelihood equations. 
In general, of course, the working correlation struc- 
ture will not be correctly specified. The ability to do 
so is the core motivation of the method, and there- 
fore Robins, Rotnitzky and Zhao (1995) proposed a 
class of weighted estimating equations to allow for 
MAR, extending GEE. 

The idea is to weight each subject's contribution 
in the GEEs by the inverse probability that a subject 
drops out at the time he dropped out. This can be 
calculated, for example, as 



i^idi =P[Di = di 



di-l 

: Y[{l-P[R,k = 0\Ri2 

k=2 



R 



■i,fc— 1 



1]) 



iI{d,<T} 



.p[Ria,=0\R,2 = --- = RLd,^i = lY 

Recall that we partitioned Yj into the unobserved 
components Y[" and the observed components Y°. 



Similarly, we can make the exact same partition of 
//j into /2™ and n°. In the WGEE approach, which 
is proposed to reduce possible bias of f3, the score 
equations to be solved when taking into account the 
correlation structure are 



^^d, 9p 



(8) 



N n+1 

EE 

i=l d=2 



I{Di = d) 



fid 

■(rf)(y(<i)-ft(d)) = 0, 

where yi{d) and Hi{d) are the first d—1 elements of 
yj and /Xj, respectively. We define d^i^/ d(3' {d) and 

{A^J'^CiA^J'^)~^{d) analogously. 

It is worthwhile to note that the recently pro- 
posed so-called doubly robust method (van der Laan 
and Robins, 2003) is more efficient and robust to a 
wider class of deviations. However, it is harder to im- 
plement than the original proposal. An alternative 
mode of analysis, generally overlooked but proposed 
by Schafer (2003), consists of multiply imputing the 
missing outcomes using a parametric model (e.g., of 
a random-effects or conditional type), followed by 
conventional GEE and conventional multiple-imputation 
inference on the so-completed sets of data. 

3.2 Random- Effects Models 

Unlike for correlated Gaussian outcomes, the pa- 
rameters of the random-effects and population-averaged 
models for correlated binary data describe different 
types of effects of the covariates on the response 
probabilities (Neuhaus, 1992). Therefore, the choice 
between population-averaged and random-effects stra- 
tegies should heavily depend on the scientific goals. 
Population-averaged models evaluate the success 
probability as a function of covariates only. With 
a subject-specific approach, the response is mod- 
eled as a function of covariates and parameters, spe- 
cific to the subject. In such models, interpretation of 
fixed-effects parameters is conditional on a constant 
level of the random-effects parameter. Population- 
averaged comparisons, on the other hand, make no 
use of within cluster comparisons for cluster vary- 
ing covariates and are therefore not useful to assess 
within-subject effects (Neuhaus, Kalbfleisch and 
Hauck, 1991). While several nonequivalent random- 
effects models exist, one of the most popular ones 
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is the generalized linear mixed model (Breslow and 
Clayton, 1993), implemented in the SAS procedure 
NLMIXED. We will focus on this one. 

3.2.1 Generalized linear mixed models. A general 
formulation of mixed-effects models is as follows. As- 
sume that Yj (possibly appropriately transformed) 
satisfies 



(9) 



Yi|b,~Fi(6>,bi 



that is, conditional on bj, Yj follows a prespeci- 
fied distribution Fj, possibly depending on covari- 
ates and parameterized through a vector of un- 
known parameters common to all subjects. Further- 
more hi is a (/-dimensional vector of subject-specific 
parameters, called random effects, assumed to fol- 
low a so-called mixing distribution G which may 
depend on a vector if) of unknown parameters [i.e., 
hi G{tp)]. The bj reflect the between-unit hetero- 
geneity in the population with respect to the dis- 
tribution of Yj. In the presence of random effects, 
conditional independence is often assumed, under 
which the components Yij in Yj are independent, 
conditional on bj. The distribution function Fi in 
(9) then becomes a product over the Tij independent 
elements in Yj. 

In general, unless a fully Bayesian approach is fol- 
lowed, inference is based on the marginal model for 
Yj which is obtained by integrating out the random 
effects over their distribution G{ip). Let /j(yj|bj) 
and g{hi) denote the density functions that corre- 
spond to the distributions Fi and G, respectively. 
We have that the marginal density function of Yj 
equals 



(10) 



fi{yi 



/i(yi|bi)s'(bi)(ibj 



which depends on the unknown parameters 6 and ip. 
Assuming independence of the units, estimates of 
6 and i/' can be obtained by maximizing the like- 
lihood function built from (10), and inferences im- 
mediately follow from classical maximum likelihood 
theory. 

It is important to realize that the random-effects 
distribution G is crucial in the calculation of the 
marginal model (10). One often assumes G to be of 
a specific parametric form, such (multivariate) 
normal. Depending on Fj and G, the integration in 
(10) may or may not be possible analytically. Pro- 
posed solutions are based on Taylor series expan- 
sions of /j(yj|bj) or on numerical approximations of 



the integral, such as (adaptive) Gaussian quadra- 
ture. 

Note that there is an important difference with 
respect to the interpretation of the fixed effects (3. 
Under the classical linear mixed model (Verbeke and 
Molenberghs, 2000), we have that -E'(Yi) equals XiP, 
such that the fixed effects have a subject-specific 
as well as a population-averaged interpretation. Un- 
der nonlinear mixed models, however, this no longer 
holds in general. The fixed effects now only refiect 
the conditional effect of covariates, and the marginal 
effect is no longer easily obtained as E(Yi) is given 
by 



Yi / /j(yi|bj)5f(bj)dbj(iyj. 



F(Yi 



However, in a biopharmaceutical context, one is of- 
ten primarily interested in hypothesis testing and 
the random-effects framework can be used to this 
effect. 

A general formulation of GLMM is as follows. 
Conditionally on random effects bj, it assumes that 
the elements Yij of Yj are independent, with den- 
sity function usually based on a classical exponential 
family formulation, that is, with mean E(Yij\hi) = 
a'{rjij) = /Lijj(bj) and variance Var(yjj |bj) = (l)a"{r]ij), 
and where, apart from a link function h (e.g., the 
logit link for binary data or the Poisson link for 
counts), a linear regression model with parameters 
P and bj is used for the mean [i.e., /i(/Xj(bj)) = 
Xif3 + Zjbj]. Note that the linear mixed model is 
a special case with identity link function. The ran- 
dom effects bj are again assumed to be sampled 
from a (multivariate) normal distribution with mean 
and covariance matrix D. Usually, the canonical 
link function is used, that is, h = a'~^, such that 
rji = XiP + Zihi. When the link function is cho- 
sen to be of the logit form and the random effects 
are assumed to be normally distributed, the familiar 
logistic- linear GLMM follows. 

3.3 Marginal versus Random- Effects Models 

It is useful to underscore the difference between 
both model families, as well as the nature of this dif- 
ference. To see the nature of the difference, consider 
a binary outcome variable and assume a random- 
intercept logistic model with linear predictor 
logit[P(ljj = l\tij,bi)] = Po + bi + Pitij, where Uj is 
the time covariate. The conditional means E(Yij\bi), 
as functions of tij, are given by 

exp(/3o + bi+Pitij) 



[11) E{Y,j\bi) 



1 + exp(/3o + bi + f3iti 
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whereas the marginal average evolution is obtained 
by averaging over the random effects, 



EiYi 



(12) 



expiPo + bj + Pitjj) 



E 



. 1 + exp(/3o + 6i + 
exp(/?o + /3itjj) 



1 + exp(/3o + ii\ti 



A graphical representation of both (11) and (12) is 
given in Figure 1. This implies that the interpreta- 
tion of the parameters in both types of models is 
completely different. A schematic display is given in 
Figure 2. Depending on the model family (marginal 
or random effects), one is led to either marginal or 
hierarchical inference. It is important to realize that 
in the general case the parameter that results 
from a marginal model is different from the param- 
eter /3^^ even when the latter is estimated using 
marginal inference. Some of the confusion surround- 
ing this issue may result from the equality of these 
parameters in the very special linear mixed model 
case. When a random-effects model is considered, 
the marginal mean profile can be derived, but it will 
generally not produce a simple parametric form. In 
Figure 2 this is indicated by putting the correspond- 
ing parameter within quotes. 

As an important example, consider our GLMM 
with logit link function, where the only random ef- 
fects are intercepts hi. It can then be shown that the 
marginal mean //j = Ei^ij) satisfies h(fj,^) ~ Xif3^ 
with 



(13) 



/3 



RE 



M 



n/cV^ + 1>1, 



in which c equals 16\/3/157r. Hence, although the 
parameters /3^^ in the generalized linear mixed model 
have no marginal interpretation, they do show a 
strong relationship to their marginal counterparts. 
Note that, as a consequence of this relationship, 
larger covariate effects are obtained under the random- 
effects model in comparison to the marginal model. 

4. ANALYSIS OF FIRST CASE STUDY 

Let us now analyze the motivating clinical trial. 
Therapies are recorded as Al for primary dose of ex- 
perimental drug, while B refers to nonexperimental 
drug and C refers to placebo. The primary contrast 
is between Al and C. Emphasis is on the difference 
between arms at the end of the study. A graphical 



representation of the dropout, per study and per 
arm, is given in Figure 3. 

The primary null hypothesis (zero difference be- 
tween the treatments and placebo in terms of pro- 
portion of the HAMDiy total score above the level 
of 7) will be tested using both marginal models (GEE 
and WGEE) and random-effects models (GLMM). 
According to the study protocol, the models will 
include the fixed categorical effects of treatment, 
visit and treatment-by-visit interaction, as well as 
the continuous, fixed covariates of baseline score and 
baseline score-by-visit interaction. A random inter- 
cept will be included when considering the random- 
effects models. Analyses will be implemented using 
the SAS procedures GENMOD and NLMIXED. 

Missing data will be handled in three different 
ways: (1) imputation using LOCF, (2) deletion of 
incomplete profiles, leading to a CC, and (3) ana- 
lyzing the data as they are, consistent with ignor- 
ability (for GLMM and WGEE). A fully longitudi- 
nal approach (View 1) is considered in Section 4.1. 
Section 4.2 compares the results of the marginal 
and random-effects models. Section 4.3 focuses on 
Views 2 (treatment effect at last planned occasion) 
and 3 (last measurement obtained). 

4.1 View 1: Longitudinal Analysis 

4.1.1 Marginal models. First, let us consider the 
GEE approach. Within the SAS procedure GEN- 
MOD, the exchangeable working correlation matrix 
is used. 

An inspection of parameter estimates and stan- 
dard errors (not shown) reveals that the interaction 
between treatment and time is nonsignificant. The 
same holds in the analyses that will be done subse- 
quently. At first sight, this suggests model simplifi- 
cation. However, there are a few reasons to prefer 
a different route. First, as stated before, a longitu- 
dinal model used in a regulatory, controlled envi- 
ronment is ideally sufficiently generally specified to 
avoid driving conclusions through models that are 
too simple. Sticking to a single, prespecified model 
also avoids dangers associated with model selection 
(e.g., infiated type I errors) recently reported in the 
literature (Hjort and Claeskens, 2003). Second, a 
general model allows for, as a by-product, assess- 
ment of treatment effect at the last planned occa- 
sion. Third, one can still assess the important null 
hypothesis of (1) no average treatment effect and 
(2) no treatment effect at any of the measurement 
occasions. These tests have been conducted and are 
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Time 

Fig. 1. Graphical representation of a random-intercept logistic curve across a range of levels of the random intercept, together 
with the corresponding marginal curve. 



reported in Table 2; for (1), also the estimated av- were true likelihood equations, that is, when the 

erage treatment effect is reported. working correlation structure is correct. In such cases 

In many cases, the empirically corrected standard 
errors are larger than the model-based ones. This is 

because model-based standard errors are the ones working correlation structure is allowed to be 

that would be obtained if the estimating equations misspecified, model-based standard errors will be bi- 



likelihood inference enjoys optimality. However, since 
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Fig. 2. Representation of model families and corresponding inference. A superscript M stands for marginal; RE denotes 
random effects. A parameter within quotes indicates that marginal functions but no direct marginal parameters are obtained. 
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ased and it is advisable to base conclusions on em- 
pirically corrected standard errors. 

Turning to WGEE, the method is applied to per- 
form an analysis that is correct under MAR, not 
only under MCAR as in ordinary GEE. This proce- 
dure is a bit more involved in terms of fitting the 
model to the data. We will outline the main steps. 
The SAS code is available from the authors upon 
request. 



To compute the necessary weights, we first fit the 
dropout model using logistic regression. The out- 
come drop is binary and indicates whether or not 
dropout occurs at a given time. The response value 
at the previous occasion (prevhamd) and treatment 
are included as covariates. Next, the predicted prob- 
abilities of dropout are translated into weights, de- 
fined at the individual measurement level. Let us de- 
scribe the procedure to construct the inverse weights. 
At the first occasion define wn = 1. At other than 




Table 2 

Depression trial, View 1: GEE, WGEE and GLMM. Tests for (1) the joint null hypothesis of no treatment effect at none 
of the time points and (2) the hypothesis of no average treatment effect 



Joint effects Mean effects 



Al B Al & Bl Al B Al & Bl 

(8 d.f.) (8 d.f.) (16 d.f.) (1 d.f.) (1 d.f.) (2 d.f.) 

Analysis P P P P (est., s.e.) p (est., s.e.) p 



CC (GEE) 


0.4278 


0.9859 


0.8444 


0.5259 (-2.66;4.19) 


0.9165 (0.47; 4.50) 


0.5845 


LOCF (GEE) 


0.7008 


0.9956 


0.9768 


0.7713 (-1.15; 3.96) 


0.9070 (0.49; 4.19) 


0.8605 


MAR (GEE) 


0.6465 


0.9931 


0.9413 


0.6015 (-1.92; 3.67) 


0.8671 (0.65; 3.89) 


0.6804 


MAR (WGEE) 


0.1690 


0.7601 


0.5372 


0.5477 (2.61; 4.34) 


0.3883 (3.97; 4.60) 


0.7224 


CC (GLMM) 


0.7572 


0.9743 


0.7233 


0.4954 (-0.40; 0.59) 


0.2671 (0.64; 0.57) 


0.0440 


LOCF (GLMM) 


0.7363 


0.9953 


0.9763 


0.1571 (-0.66; 0.47) 


0.4555 (-0.34; 0.45) 


0.3611 


MAR (GLMM) 


0.7476 


0.9738 


0.7152 


0.4495 (-0.41; 0.55) 


0.2844 (0.58; 0.54) 


0.0375 



LONGITUDINAL CLINICAL TRIAL DATA 



13 



the last occasion, the quantity of interest equals the 
cumulative weight over the previous occasions, mul- 
tiplied by (1 — the predicted probability of dropout). 
At the last occasion within a sequence where dropout 
occurs, it is multiplied by the predicted probability 
of dropout. At the end of the process this quan- 
tity is inverted to yield the actual weight. After 
these preparations we merely need to include the 
weights by means of the scwgt statement within the 
GENMOD procedure. Together with the use of the 
repeated statement, WGEE follows. Also here we 
use the exchangeable working correlation matrix. 

Let us now turn to the results. The marginal mod- 
els reveal nonsignificant treatment effects in all cases, 
for either the composite hypothesis of no treatment 
effects or the hypotheses of no average effects. This 
holds for both arms separately, as well as for the 
two arms jointly. Corresponding to the 1-degree-of- 
freedom (d.f.) tests, parameter estimates and stan- 
dard errors can be estimated as well. For concise- 
ness, only empirically corrected standard errors are 
shown. A strong difference is observed between the 
WGEE and other cases. Since this is the only one 
valid under MAR, it is clear that there are dangers 
associated with methods that are too simple. Fur- 
thermore, some of the CC p- values are smaller than 
their MAR and LOCF counterparts. 



4.1.2 Random- effect models. To fit generalized lin- 
ear mixed models, we use the SAS procedure 
NLMIXED, which allows fitting a wide class of lin- 
ear, generalized linear and nonlinear mixed models. 
It relies on numerical integration. Not only are dif- 
ferent integral approximations available, the princi- 
pal ones being (adaptive) Gaussian quadrature, it 
also includes a number of optimization algorithms. 
The difference between nonadaptive and adaptive 
Gaussian quadrature is that for the first procedure 
the quadrature points are centered at zero for each 
of the random effects and the current random-effects 
covariance matrix is used as the scale matrix, while 
for the latter the quadrature points will be appropri- 
ately centered and scaled, such that more quadra- 
ture points lie in the region of interest (Molenberghs 
and Verbeke, 2005). We will use both adaptive and 
nonadaptive quadrature, with several choices for the 
number of quadrature points, to check the stability 
of the results over a variety of choices for these nu- 
merical choices. 

Precisely, we initiate the model fitting using non- 
adaptive Gaussian quadrature, together with the 
quasi-Newton optimization algorithm. The number 
of quadrature points is left to be determined by 
the procedure, and all starting values are set equal 




Fig. 4. The effect of adaptive versus nonadaptive quadrature, quasi-Newton versus Newton-Raphson and the number of 
quadrature points on the treatment effect parameter for arm Al. 
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to 0.5. Using the resulting parameter estimates, we 
keep these choices but hold the number of quadra- 
ture points fixed (2, 3, 5, 10, 20 and 50). Subse- 
quently, we switch to adaptive Gaussian quadra- 
ture (step 2). Finally, the quasi-Newton optimiza- 
tion is replaced by the Newton-Raphson optimiza- 
tion (step 3). The effect of the method and the 
number of quadrature points is graphically repre- 
sented in Figure 4 for a selected parameter (treat- 
ment effect of Al). While the differences between 
these choices are purely numerical, we do notice dif- 
ferences between the results, illustrating that a nu- 
merical sensitivity analysis matters. The parameter 
estimates tend to stabilize with increasing number of 
quadrature points. However, nonadaptive Gaussian 
quadrature needs obviously more quadrature points 
than adaptive Gaussian quadrature. 

Focusing on the results for 50 quadrature points, 
we observe that the parameter estimates for steps 
1 and 2 are the same. On the other hand, parame- 
ter estimates for step 3 are different (order of 10~^, 
visible in p- values) . In spite of the differences in pa- 
rameter estimates, it is noteworthy that the likeli- 
hood is the same in all steps, due to a flat likelihood. 
This was confirmed by running all steps again using 
the parameter estimates of step 3 as starting values, 
at which point the parameter estimates all coincide. 
Thus, it may happen that the optimization routine 
has only seemingly converged. 

Exactly as in the marginal model case, we assessed 
average treatment effect as well as treatment effect 
at any of the times. The results are reported in Ta- 
ble 2 as well. The parameter p-values are more var- 
ied across methods than in the marginal model case. 
The most striking feature is that there is evidence 
for a treatment effect in the two groups together, 
under MAR, and also with CC. Note that the cor- 
responding 1-degree-of-freedom tests do not show 
significance. In the LOCF case some p-values are 
smaller, while others are larger. This contradicts a 
common belief that LOCF is conservative. Molen- 
berghs et al. (2004) have shown that both conserva- 
tive and liberal behavior is possible. 

4.2 Marginal versus Random- Effects Models 

In all cases, the variability of the random effect 
(standard deviation parameter a) is highly signif- 
icant. This implies that the GEE parameters and 
the random-effects parameters cannot be compared 
directly. If the conversion factor (13) is computed, 
then one roughly finds a factor of about 2.5. We 



note that this factor is not reproduced when the two 
sets of estimates (estimates not shown) are directly 
compared. This is due to the fact that (13) oper- 
ates at the true population parameter level, while 
we only have parameter estimates at our disposal. 
Since many of the estimates are only marginally or 
not significant, it is not unexpected, therefore, to ob- 
serve deviations from this relationship, even though 
the general tendency is preserved in most cases. 

4.3 Views 2 and 3: Single Time Point Analysis 

When emphasis is on the last measurement oc- 
casion, LOCF and CC are straightforward to use. 
When the last observed measurement is of interest 
(a different scientific question), the analysis is not 
different from the one obtained under LOCF, but, 
of course, in this case CC is not an option. 

Since the outcome is a dichotomous response, the 
data can be summarized in a 2 x A; table, where k 
represents the number of treatments. The analysis 
essentially consists of comparing the proportions of 
success or failure in all groups. For this purpose, 
both Pearson's chi-squared test (Agresti, 1990) and 
Fisher's exact test (Freeman and Halton, 1951) will 
be used. Nevertheless, it is still possible to obtain in- 
ferences from a full longitudinal model in this con- 
text. We add these for the sake of reference, but 
it should be understood that the analysis using a 
simple model for the last time point only is more 
in line with practice. When an ignorable analysis 
is considered, one has to explicitly consider all in- 
complete profiles in order to correctly incorporate 
all information available. Thus, one has to consider 
a longitudinal model. 

Placebo C is considered as the reference treat- 
ment. Let ai be the effect of treatment arm i at the 
last measurement occasion, where i = Al, B or C. 
We wish to test whether, at the last measurement 
occasion, all treatment effects are equal. This trans- 
lates into QAi = aB = «c or, equivalently, into qai — 
oc = Q;b — ttc = 0- Such contrasts can be obtained 
very easily using the SAS procedure NLMIXED. Ta- 
ble 3 shows a summary of the results in terms of 
p- values. 

The GLMMs lead to a small difference between 
CC and MAR: both are borderline. On the other 
hand, the GLMM for LOCF leads to a nonsignif- 
icant result. An endpoint analysis (i.e., using the 
last available measurement) shows the same result 
for LOCF (nonsignificant), whereas the result for 
CC becomes significant. An endpoint analysis leads 
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Table 4 

Age related macular degeneration trial. Parameter estimates {model-based standard errors; empirically 
corrected standard errors) for the marginal models: GEE on the CC and LOCF population, and on the 

observed data; 
in the latter case WGEE is also used 



Observed data 



Effect 


Parameter 


CC 


LOCF 


Unweighted 


WGEE 


Int. 4 


/3ii 


-1.01 (0.24;0.24) 


-0.87 (0.20;0.21) 


-0.87 (0.21; 0.21) 


-0.98 (0.10; 0.44) 


Int. 12 


/321 


-0.89 (0.24; 0.24) 


-0.97 (0.21;0.21) 


-1.01 (0.21;0.21) 


-1.78 (0.15;0.38) 


Int. 24 


/331 


-1.13 (0.25;0.25) 


-1.05 (0.21;0.21) 


-1.07 (0.22; 0.22) 


-1.11 (0.15;0.33) 


Int. 52 


Pai 


-1.64 (0.29;0.29) 


-1.51 (0.24; 0.24) 


-1.71 (0.29; 0.29) 


-1.72 (0.25;0.39) 


Trt. 4 


Pl2 


0.40 (0.32; 0.32) 


0.22 (0.28; 0.28) 


0.22 (0.28; 0.28) 


0.80 (0.15; 0.67) 


Trt. 12 


022 


0.49 (0.31;0.31) 


0.55 (0.28; 0.28) 


0.61 (0.29; 0.29) 


1.87 (0.19;0.61) 


Trt. 24 


P32 


0.48 (0.33; 0.33) 


0.42 (0.29; 0.29) 


0.44 (0.30; 0.30) 


0.73 (0.20; 0.52) 


Trt. 52 


Pi2 


0.40 (0.38; 0.38) 


0.34 (0.32; 0.32) 


0.44 (0.37; 0.37) 


0.74 (0.31; 0.52) 


Corr. 


P 


0.39 


0.44 


0.39 


0.33 



Table 3 

Depression trial, Views 2 and 3: p-values are reported 
{mixed refers to the assessment of treatment at the last visit 
based on a generalized linear mixed model) 



Method 


Model 


p- Value 


CC 


Mixed 


0.0463 




Pearson's chi-squared test 


0.0357 




Fisher's exact test 


0.0336 


LOCF 


Mixed 


0.1393 




Pearson's chi-squared test 


0.1553 




Fisher's exact test 


0.1553 


MAR 


Mixed 


0.0500 



to a completely different picture, with results that 
are strongly different (significant) from the GLMM 
model. This illustrates that the choice between mod- 
eling techniques is far from an academic question, 
but can have profound impact on the study conclu- 
sions, ranging from highly significant over borderline 
(non)significant to highly nonsignificant. 

5. ANALYSIS OF THE SECOND CASE STUDY 

We compare analyses performed on the completers 
only (CC), on the LOCF imputed data and on the 
observed data. For the observed, partially incom- 
plete data, GEE is supplemented with WGEE. Fur- 
thermore, a random-intercepts GLMM is considered, 
based on numerical integration. The GEE analyses 
are reported in Table 4 and the random-effects mod- 
els in Table 5. For GEE, a working exchangeable 
correlation matrix is considered. The model has four 
intercepts and four treatment effects. To be precise. 



the marginal regression model takes the form 

logit[P(yi, = l\Ti)] = (3ji + (3j2Ti, 

where j = 1, ... ,4 refers to measurement occasion, 
Tj is the treatment assignment for subject i = 1, . . . , 240 
and Yij is the indicator for whether or not three lines 
of vision have been lost for subject i at time j. The 
advantage of having separate treatment effects at 
each time is that particular attention can be given to 
the treatment effect assessment at the last planned 
measurement occasion (i.e., after one year). From 
Table 4 it is clear that the model-based and em- 
pirically corrected standard errors agree extremely 
well. This is due to the unstructured nature of the 

Table 5 

Age related macular degeneration trial. Parameter estimates 
{standard errors) for the random-intercept models: 
Numerical-integration-based fits {adaptive Gaussian 
quadrature) on the CC and LOCF population, 
and on the observed data {direct-likelihood) 

Direct 



Effect 


Parameter 


CC 


LOCF 


likelihood 


Int. 


4 




-1.73 (0.42) 


-1.63 


(0.39) 


-1.50 


(0.36) 


Int. 


12 




-1.53 (0.41) 


-1.80 


(0.39) 


-1.73 


(0.37) 


Int. 


24 


1331 


-1.93 (0.43) 


-1.96 


(0.40) 


-1.83 


(0.39) 


Int. 


52 


Pai 


-2.74 (0.48) 


-2.76 


(0.44) 


-2.85 


(0.47) 


Trt. 


4 


I3l2 


0.64 (0.54) 


0.38 


(0.52) 


0.34 


(0.48) 


Trt. 


12 


1322 


0.81 (0.53) 


0.98 


(0.52) 


1.00 


(0.49) 


Trt. 


24 


1332 


0.77 (0.55) 


0.74 


(0.52) 


0.69 


(0.50) 


Trt. 


52 


I3i2 


0.60 (0.59) 


0.57 


(0.56) 


0.64 


(0.58) 


R.I. 


s.d. 


T 


2.19 (0.27) 


2.47 


(0.27) 


2.20 


(0.25) 


R.I. 


var. 




4.80 (1.17) 


6.08 


(1.32) 


4.83 


(1.11) 
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full time by treatment mean structure. However, we 
do observe differences in the WGEE analyses. Not 
only are the parameter estimates mildly different be- 
tween the two GEE versions, but there is a dramatic 
difference between the model-based and empirically 
corrected standard errors. Nevertheless, the two sets 
of empirically corrected standard errors agree very 
closely, which is reassuring. 

When comparing parameter estimates across CC, 
LOCF and observed data analyses, it is clear that 
LOCF has the effect of artificially increasing the cor- 
relation between measurements. The effect is mild in 
this case. The parameter estimates of the observed- 
data GEE are close to the LOCF results for earlier 
time points and close to CC for later time points. 
This is to be expected, because at the start of the 
study the LOCF and observed populations are vir- 
tually the same, with the same holding between CC 
and observed populations near the end of the study. 
Note also that the treatment effect under LOCF, es- 
pecially at 12 weeks and after 1 year, is biased down- 
ward in comparison to the GEE analyses. To prop- 
erly use the information in the missingness process, 
WGEE can be used. To this end, a logistic regression 
for dropout, given covariates and previous outcomes, 
needs to be fitted. Parameter estimates and stan- 
dard errors are given in Table 6. Intermittent miss- 
ingness will be ignored. Covariates of importance are 
treatment assignment, the level of lesions at baseline 
(a four-point categorical variable, for which three 
dummies are needed) and time at which dropout oc- 
curs. For the latter covariates, there are three levels, 
since dropout can occur at times 2, 3 or 4. Hence, 
two dummy variables are included. Finally, the pre- 
vious outcome does not have a significant impact, 
but will be kept in the model nevertheless. In spite 
of there being no strong evidence for MAR, the re- 
sults between GEE and WGEE differ quite a bit. It 
is noteworthy that at 12 weeks, a treatment effect 
is observed with WGEE that goes unnoticed with 
the other marginal analyses. This finding is mildly 
confirmed by the random-intercept model when the 
data as observed are used. 

The results for the random-effects models are given 
in Table 5. We observe the usual relationship be- 
tween the marginal parameters of Table 4 and their 
random-effects counterparts. Note also that the 
random-intercepts variance is largest under LOCF, 
underscoring again that this method artificially in- 
creases the association between measurements on 
the same subject. In this case, unlike for the marginal 



Table 6 

Age related macular degeneration trial. 
Parameter estimates {standard errors) for a 
logistic regression model to describe dropout 



Effect 


Parameter 


Estimate (s.e.) 


Intercept 


^0 


0.14 (0.49) 


Previous outcome 




0.04 (0.38) 


Treatment 




-0.86 (0.37) 


Lesion level 1 




-1.85 (0.49) 


Lesion level 2 


1^32 


-1.91 (0.52) 


Lesion level 3 


1p33 


-2.80 (0.72) 


Time 2 


Tp41 


-1.75 (0.49) 


Time 3 


i>A2 


-1.38 (0.44) 



models, LOCF and in fact also CC slightly to consid- 
erably overestimate the treatment effect at certain 
times, in particular at 4 and 24 weeks. 

6. DISCUSSION 

In this paper we have indicated that a variety of 
approaches is possible when analyzing incomplete 
longitudinal data from clinical trials. First, unlike in 
the continuous case where the linear mixed model 
is the main mode of analysis, one has the choice 
between a marginal model (generalized estimating 
equations, GEE) and a random-effects approach (gen- 
eralized linear mixed models, GLMM). While these 
may provide similar results in terms of hypothesis 
testing, things are different when the models are 
used for estimation purposes, because the param- 
eters have quite different meanings. Both GEE and 
GLMM can be used when data are incomplete. For 
GLMM this holds under the fairly general assump- 
tion of an MAR mechanism, while for GEE the stronger 
MCAR is required. However, GEE can be extended 
to weighted GEE, making it also valid under MAR. 
Current statistical computing power has brought both 
GLMM and WGEE within reach, and we have im- 
plemented such analyses in the real-life setting of 
clinical trials on depression and on macular degener- 
ation. This underscores that simple but potentially 
highly restrictive modes of analyses, such as CC or 
LOCF, should no longer be seen as the preferred 
mode of analysis. This message is in line with the 
one reached for continuous outcomes (Molenberghs 
et al., 2004). 

While in the studies considered here there are 
no extreme differences between the various analy- 
ses conducted, some differences are noticeable, es- 
pecially in the second case study (Molenberghs et 
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al., 2004). So, generally, caution should be used and 
it is best to move away from the overly simple meth- 
ods. 

Note that such a full longitudinal approach un- 
der MAR is also very sensible, even when one is 
interested in an effect at one particular scheduled 
measurement occasion, say, the treatment effect at 
the last scheduled visit. Indeed, an ignorable anal- 
ysis takes all information into account, not only from 
complete observations, but also from incomplete ones, 
through the conditional expectation of the missing 
measurements given the observed ones. Thus, when 
combined with an analysis where the treatment al- 
location is used "as randomized" rather than "as 
treated," such an approach is fully compatible with 
the intention-to-treat principle. 

When there is residual doubt about the plausibil- 
ity of MAR, one can conduct a sensitivity analysis. 
Many proposals have been made, but this remains 
an active area of research. Obviously, a number of 
MNAR models can be fitted, provided one is prepared 
to approach formal aspects of model comparison 
with due caution. Such analyses can be complemented 
with appropriate (global and/or local) influence anal- 
yses. Some sensitivity analyses frameworks have been 
provided by Rotnitzky, Robins and Scharfstein (1998), 
Forster and Smith (1998), Raab and Donnelly (1999), 
Kenward, Goetghebeur and Molenberghs (2001), 
van Steen, Molenberghs, Verbeke and Thijs (2003) 
and Jansen et al. (2003). 
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